adaptive approach
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach
Policy evaluation via Monte Carlo (MC) simulation is at the core of many MC Reinforcement Learning (RL) algorithms (e.g., policy gradient methods). In this context, the designer of the learning system specifies an interaction budget that the agent usually spends by collecting trajectories of fixed length within a simulator. However, is this data collection strategy the best option? To answer this question, in this paper, we take as a quality index the variance of an unbiased policy return estimator that uses trajectories of different lengths, i.e., truncated. We first derive a closed-form expression of this variance that clearly shows the sub-optimality of the fixed-length trajectory schedule. Furthermore, it suggests that adaptive data collection strategies that spend the available budget sequentially might be able to allocate a larger portion of transitions to timesteps in which more accurate sampling is required to reduce the variance of the final estimate.
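The idea of estimating a return from trajectories of different lengths can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `truncated_mc_estimate`, the reward-list representation, and the per-timestep averaging rule are assumptions chosen for illustration. The reward at each timestep is averaged over only those trajectories that actually reached it, so every timestep needs at least one sample.

```python
def truncated_mc_estimate(trajectories, gamma):
    """Estimate a discounted return from reward sequences of unequal length.

    trajectories: non-empty list of non-empty reward lists; trajectory i
    covers timesteps 0 .. len(trajectories[i]) - 1 (hypothetical format).
    gamma: discount factor.
    """
    if not trajectories or any(len(traj) == 0 for traj in trajectories):
        raise ValueError("need at least one trajectory, each with >= 1 reward")

    # The longest trajectory defines the estimation horizon, so every
    # timestep below is covered by at least one sample.
    horizon = max(len(traj) for traj in trajectories)
    estimate = 0.0
    for t in range(horizon):
        # Only trajectories that were not truncated before t contribute.
        rewards_at_t = [traj[t] for traj in trajectories if len(traj) > t]
        estimate += (gamma ** t) * sum(rewards_at_t) / len(rewards_at_t)
    return estimate


# Same budget of 6 transitions, spent on three trajectories of lengths 3, 2, 1.
value = truncated_mc_estimate([[1.0, 1.0, 1.0], [1.0, 1.0], [1.0]], gamma=0.5)
```

In this sketch, early timesteps receive more samples than late ones, which is the kind of non-uniform allocation a truncated schedule allows under a fixed budget.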
An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints
In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by V_T, referred to as the slow-rotting case, and the other in which the cumulative number of rotting instances is bounded by S_T, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards. Our proposed algorithm achieves tight regret bounds for both slow and abrupt rotting scenarios. Lastly, we demonstrate the performance of our algorithm using numerical experiments.
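A sliding-window UCB index can be sketched as follows. The abstract does not specify how the window is adapted, so the rule used here (take the smallest index over doubling window sizes) is an assumption for illustration, as are the names `window_ucb` and `adaptive_window_ucb` and the form of the confidence bonus. A short window tracks recent, possibly rotted, rewards with low bias but high variance; a long window does the opposite.

```python
import math


def window_ucb(rewards, window, horizon, sigma=1.0):
    """UCB index computed from the last `window` rewards of a single arm."""
    if window < 1 or window > len(rewards):
        raise ValueError("window must be between 1 and len(rewards)")
    recent = rewards[-window:]
    mean = sum(recent) / len(recent)
    # Confidence width shrinks as the window grows (assumed sub-Gaussian form).
    bonus = sigma * math.sqrt(2.0 * math.log(horizon) / len(recent))
    return mean + bonus


def adaptive_window_ucb(rewards, horizon, sigma=1.0):
    """Smallest index over window sizes 1, 2, 4, ... and the full history.

    This selection rule is a hypothetical stand-in for the paper's adaptive
    window; it only illustrates the bias and variance trade-off.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one observed reward")
    windows = []
    w = 1
    while w < n:
        windows.append(w)
        w *= 2
    windows.append(n)
    return min(window_ucb(rewards, w, horizon, sigma) for w in windows)


# An arm whose mean reward dropped abruptly halfway through its pulls.
history = [1.0] * 8 + [0.0] * 8
full_index = window_ucb(history, len(history), horizon=1000)
adaptive_index = adaptive_window_ucb(history, horizon=1000)
```

On this history the adaptive index is lower than the full-history index, because a window covering only the recent pulls is not inflated by the rewards observed before the drop.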
WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs
Xie, Weijian, Liang, Xuefeng, Liu, Yuhui, Ni, Kaihua, Cheng, Hong, Hu, Zetian
Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important way to achieve Artificial General Intelligence (AGI). However, LLMs are prone to produce factually incorrect information and often produce "phantom" content that undermines their reliability, which poses a serious challenge for their deployment in real-world scenarios. Enhancing LLMs by combining external databases and information retrieval mechanisms is an effective path. To address the above challenges, we propose a new approach called WeKnow-RAG, which integrates Web search and Knowledge Graphs into a "Retrieval-Augmented Generation (RAG)" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. WeKnow-RAG then utilizes domain-specific knowledge graphs to satisfy a variety of queries and domains, thereby improving performance on factual information and complex reasoning tasks by employing multi-stage web page retrieval techniques using both sparse and dense retrieval methods. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process. Finally, we also integrate a self-assessment mechanism for the LLM to evaluate the trustworthiness of the answers it generates. Our approach proves its outstanding effectiveness in a wide range of offline experiments and online submissions.
An adaptive approach to Bayesian Optimization with switching costs
Pricopie, Stefan, Allmendinger, Richard, Lopez-Ibanez, Manuel, Fare, Clyde, Benatan, Matt, Knowles, Joshua
We investigate modifications to Bayesian Optimization for a resource-constrained setting of sequential experimental design where changes to certain design variables of the search space incur a switching cost. This models the scenario where there is a trade-off between evaluating more while maintaining the same setup, or switching and restricting the number of possible evaluations due to the incurred cost. We adapt two process-constrained batch algorithms to this sequential problem formulation, and propose two new methods -- one cost-aware and one cost-ignorant. We validate and compare the algorithms using a set of 7 scalable test functions in different dimensionalities and switching-cost settings for 30 total configurations. Our proposed cost-aware hyperparameter-free algorithm yields comparable results to tuned process-constrained algorithms in all settings we considered, suggesting some degree of robustness to varying landscape features and cost trade-offs. This method starts to outperform the other algorithms with increasing switching-cost. Our work broadens out from other recent Bayesian Optimization studies in resource-constrained settings that consider a batch setting only. While the contributions of this work are relevant to the general class of resource-constrained problems, they are particularly relevant to problems where adaptability to varying resource availability is of high importance.
Sentiment analysis with adaptive multi-head attention in Transformer
We propose a novel framework based on the attention mechanism to identify the sentiment of a movie review document. Previous efforts on deep neural networks with attention mechanisms focus on encoders and decoders with a fixed number of attention heads. Therefore, we need a mechanism to stop the attention process automatically if no more useful information can be read from the memory. In this paper, we propose an adaptive multi-head attention architecture (AdaptAttn) which varies the number of attention heads based on the length of sentences. AdaptAttn has a data preprocessing step where each document is classified into one of three bins (small, medium, or large) based on sentence length. A document classified as small goes through two heads in each layer, the medium group passes through four heads, and the large group is processed by eight heads. We examine the merit of our model on the Stanford large movie review dataset. The experimental results show that the F1 score from our model is on par with the baseline model.
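The binning step described above can be sketched in a few lines of Python. The head counts (two, four, eight) come from the abstract; the length thresholds `small_max` and `medium_max`, the token-count measure, and the function names are hypothetical, since the abstract does not state how the bins are delimited.

```python
# Head counts per bin, as stated in the abstract.
HEADS_PER_BIN = {"small": 2, "medium": 4, "large": 8}


def length_bin(num_tokens, small_max=128, medium_max=512):
    """Assign a document to a bin by length (thresholds are assumptions)."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if num_tokens <= small_max:
        return "small"
    if num_tokens <= medium_max:
        return "medium"
    return "large"


def heads_for_document(num_tokens, small_max=128, medium_max=512):
    """Number of attention heads used in each layer for this document."""
    return HEADS_PER_BIN[length_bin(num_tokens, small_max, medium_max)]


# Route three reviews of different lengths to their head counts.
head_counts = [heads_for_document(n) for n in (40, 300, 2000)]
```

Keeping the head counts as powers of two means a typical model width stays divisible by the head count, which multi-head attention implementations generally require.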
Auditory cueing strategy for stride length and cadence modification: a feasibility study with healthy adults
Wu, Tina LY, Murphy, Anna, Chen, Chao, Kulic, Dana
People with Parkinson's Disease experience gait impairments that significantly impact their quality of life. Visual, auditory, and tactile cues can alleviate gait impairments, but they can become less effective due to the progressive nature of the disease and changes in people's motor capability. In this study, we develop a human-in-the-loop (HIL) framework that monitors two key gait parameters, stride length and cadence, and continuously learns a person-specific model of how the parameters change in response to the feedback. The model is then used in an optimization algorithm to improve the gait parameters. This feasibility study examines whether auditory cues can be used to influence stride length in people without gait impairments. The results demonstrate the benefits of the HIL framework in maintaining people's stride length in the presence of a secondary task.
Cognitive Modeling of Semantic Fluency Using Transformers
Nighojkar, Animesh, Khlyzova, Anna, Licato, John
Two of the most important ideas underpinning contemporary cognitive science, and the closely related AI subfield of computational cognitive modeling, are the suppositions that the human mind uses cognitive structures and that progress in understanding the mind can come from modeling those structures and the algorithms which operate on them. The semantic fluency task (SFT), sometimes called the verbal fluency task (Welsh et al. [1991]), is commonly employed in service of those goals. In SFT, participants name as many items belonging to a particular semantic category (animals, fruits, etc.) as they can in a fixed amount of time (typically 40-180 seconds). Despite this task's simplicity, the lists generated by participants (which we call semantic fluency lists or SFLs) offer insights into the structure of human knowledge and the heuristics used for memory retrieval. For example, words sharing semantic features tend to group in clusters, and there is often a temporal delay before a participant switches from one cluster to another. Multiple approaches to computationally modeling behaviors in SFT have been proposed (Hills et al. [2012], Abbott et al. [2015], Zemla et al. [2016], Zemla and Austerweil [2017], Avery and Jones [2018]), most relying on graph-based representations in which words are represented as nodes, and edges correspond to some meaningful semantic relationship between the nodes. However, to date, no work has explored whether transformer-based language models (TLMs) can be any better at modeling the generation of SFLs.
And there are multiple reasons, at least from an exploratory perspective, to suspect TLMs might do well in this regard, e.g.: (1) a large body of literature demonstrates why semantic memory cannot be sufficiently represented purely by fixed associative links between lexical nodes; at minimum, representations must allow for dynamic role binding, hierarchical (or otherwise unidirectional) activations, and enough richness to carry out structure-sensitive similarity assessments (Holyoak and Hummel [2000], Sun [2002]); (2) TLMs perform unexpectedly well on human-oriented linguistic benchmarks (Wang et al. [2019]), and they are typically pre-trained using a lengthy process designed to embed deep semantic knowledge, resulting in a dense encoding of semantic relationships (Cui et al. [2020]); (3) the pre-training process often proceeds by optimizing LMs to perform well on the MLM (masked language modeling) task, which shares more than a passing resemblance to the kind of word prediction that some
Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models
Lombardo, Pierangelo, Boiardi, Alessio, Colombo, Luca, Schiavone, Angelo, Tamagnone, Nicolò
Relatedness-based evaluation, known as intrinsic evaluation in the context of embedding-based models, requires the construction of a dataset of human annotations, which may be collected via two different approaches. The former relies on a small group of linguistic experts to create a gold standard dataset, which is reliable but very expensive and, due to the subjectivity of relatedness and to the limited number of annotations, highly susceptible to bias and lack of statistical significance (Blanco et al., 2013; Faruqui et al., 2016). The latter relies on a large group of non-experts, typically associated with a crowdsourcing service (e.g., Amazon MTurk, ProlificAcademic, SocialSci, CrowdFlower, ClickWorker, CrowdSource); it is typically more affordable, and it has been proven to be repeatable and reliable (Blanco et al., 2013). In the next sections we describe and justify a protocol to construct a dataset based on semantic relatedness between pairs of tokens
A standard approach to evaluate a relatedness-based model is the comparison of the semantic ranking it produces with the corresponding ranking determined from human annotations. However, the relevance of rank mismatches may depend on the involved positions; in particular, top ranks are considered more important in many contexts, two prominent examples being content-based recommenders (De Gemmis et al., 2008, 2015; Lops et al., 2011; Mladenic, 1999) and semantic matching (Giunchiglia et al., 2004; Li and Xu, 2014; Wan et al., 2016). The greater significance of top ranks compared with low ranks is actually a pretty common phenomenon, as it can be argued from the attempts to overweight the former in the context of ranking correlation (Blest, 2000; Pinto da Costa and Soares, 2005; Dancelli et al., 2013; Iman and Conover, 1987; Maturi and Abdelfattah, 2008;
Miltton creates a new evidence based adaptive approach to strategic marketing and communication with alliance of scientific, technological and creative talent - Miltton Group
Miltton Group is merging the worlds of technology, marketing and communications, and management consulting by connecting machine learning and predictive social dynamics research and talent with its existing experts and services. Miltton Group has joined forces with an international group of researchers, the emmy.network. Drawn from tech companies and research institutions like McGill and CERN, the network's members possess strong backgrounds in mathematics, theoretical physics, and computer science. The network is led by Dr. Jussi Westergren, a mathematician whose experience ranges from advising global companies like Intellectual Ventures to helping found organizations like DeepMind and academia.edu. Miltton Branch will be headed by Philip Roy, a programme manager from the emmy.network.